Phenomics and transcriptomics analyses reveal deposition of suberin and lignin in the short fiber cell walls produced from a wild cotton species and two mutants

Fiber length is one of the major properties determining the quality and commercial value of cotton. To understand the mechanisms regulating fiber length, genetic variations of cotton species and mutants producing short fibers have been compared with cultivated cottons generating long and normal fibers. However, their phenomic variation other than fiber length has not been well characterized. Therefore, we compared physical and chemical properties of the short fibers with the long fibers. Fiber characteristics were compared in two sets: 1) wild diploid Gossypium raimondii Ulbrich (short fibers) with cultivated diploid G. arboreum L and tetraploid G. hirsutum L. (long fibers); 2) G. hirsutum short fiber mutants, Ligon-lintless 1 (Li1) and 2 (Li2) with their near isogenic line (NIL), DP-5690 (long fibers). Chemical analyses showed that the short fibers commonly consisted of greater non-cellulosic components, including lignin and suberin, than the long fibers. Transcriptomic analyses also identified up-regulation of the genes related to suberin and lignin biosynthesis in the short fibers. Our results may provide insight on how high levels of suberin and lignin in cell walls can affect cotton fiber length. The approaches combining phenomic and transcriptomic analyses of multiple sets of cotton fibers sharing a common phenotype would facilitate identifying genes and common pathways that significantly influence cotton fiber properties.


Introduction
Cotton (Gossypium sp.) is the most economically important natural fiber in the world [1]. In addition to the agronomic importance, cotton fibers are also utilized as an ideal biological model for studying molecular mechanisms involved in cell elongation and cell wall biogenesis because cotton fiber cells are unicellular and larger and longer than any other plant cell [2]. Cotton fiber development is divided into four overlapping stages: 1) initiation, 2) primary cell wall (PCW) biosynthesis characterized by fiber elongation, 3 1 , and Li 2 ) were planted on two-row plots located at the Southern Regional Research Center (New Orleans, LA; 2017) with naturally neutral-day conditions. The soil type of the cotton plot was aquents dredged over alluvium in an elevated location to provide adequate drainage. Single row plots were 12 m long with approximately 40 plants per plot. The distance between two rows was 0.5 m, and the distance between two plants within a row was 0.3 m. To minimize environmental effects, boll samples were not collected from plants on the perimeter of the field and the end of each row. At harvest, approximately 60 naturally opened bolls were randomly collected from two plots for each cotton variety, and separated into two biological replicates with 30 bolls per biological replicate for further analyzing physical and chemical properties of each cotton variety.
To collect developing fibers at various developmental stages, wild diploid G. raimondii D 5 -6 and D 5 -31 along with G. arboreum and G. hirsutum were grown in a growth chamber (Percival Intellus Environmental Controller, Perry, IA) in 8 L pots at 28˚C (day) / 24˚C (night) with a short photoperiod condition (9h day light, 300 μmolm -2 s -1 ) during the vegetative stage, and reduced to 26˚C (day)/ 18˚C (night) during flowering and boll development stages. The pots were filled with Metro-Mix 350 soil. For fiber length measurement, two plants of wild diploid G. raimondii D 5 -6 were grown in 167 L containers at an NCGC greenhouse located at College Station, Texas during a winter season for a short photoperiod condition. The two G. raimondii D 5 -6 plants produced four bolls, and separated into two biological replicates with two bolls per biological replicate. To obtain sufficient G. raimondii fibers for fiber length and chemical analyses, G. raimondii D 5 -31 was grown perennially at the cotton winter nursery at Tecoman, Colima, Mexico in association with the location of the Instituto Nacional de Investigaciones Forestales, Agrícolas y Pecuarias [26]. Three G. raimondii D 5 -31 plants (240 days after planting) were transplanted on the ground of the cotton winter nursery. In the second year, they produced 400 bolls that were separated into two biological replicates with 200 bolls per biological replicate for further analyzing fiber length and chemical analyses. All G. raimondii grown in the growth chamber, greenhouse, and cotton winter nursery produced a common phenotype demonstrating short and green colored fibers.

Cotton fiber length measurements
Maximum fiber lengths were estimated by placing ovules on a watch-glass and gently spraying fibers with a stream of distilled water as described by Schubert et al. [27]. Ten to thirty cotton bolls were randomly selected from each biological replicate samples of G. arboreum (A 2 -100 and SXY1), G. raimondii (D 5 -6 and D 5 -31) and G. hirsutum (TM-1, SG-747, DP-5690, Li 1 , and Li 2 ). Single cotton seeds were randomly selected from an individual cotton boll. The distance between the chalazal end of the selected seeds and the tip of the spread fibers were measured to the nearest 0.1 mm with a digital caliper. Mean maximum fiber length of each cotton variety was obtained by measuring the randomly selected seeds from two biological replicates.

Updegraff cellulose assay
Cellulose contents of developed cotton fibers were measured by the modified Updegraff method [28]. Five cotton bolls were randomly selected from each biological replicate samples of the cultivated G. arboreum (A 2 -100 and SXY1) and G. hirsutum (TM-1, SG-747, and DP-5690) producing long fibers. Two to six cotton bolls were also randomly selected from each biological replicate samples of the G. raimondii (D 5 -6 and D 5 -31) and G. hirsutum mutants (Li 1 and Li 2 ) generating short fibers. Dried fiber samples of the selected bolls were manually harvested, and cut into small pieces. Ten milligrams of the blended fibers were placed in 5 mL Reacti-Vials TM (Thermol Fisher Scientific, Waltham, MA), and hydrolyzed with acetic-nitric reagent (a mixture of 73% acetic acid, 9% nitric acid and 18% water). The remaining cellulose was hydrolyzed with 67% sulfuric acid (v/v) and measured by a colorimetric assay with anthrone with Avicel PH-101 (FMC, Rockland, ME, USA) as a cellulose standard. Mean cellulose content of each cotton variety was obtained by measuring the randomly selected cotton bolls from two biological replicates.

Attenuated Total Reflection Fourier Transform Infrared (ATR FT-IR) spectral collection and data analysis
Five cotton bolls were randomly selected from each biological replicate samples of cultivated G. arboreum A 2 -100 and G. hirsutum TM-1 and DP-5690 producing long fibers as well as wild G. raimondii D 5 -31 and G. hirsutum Li 1 and Li 2 mutants generating short fibers. Dried fiber samples were manually harvested from the selected bolls, and divided into six portions that were directly scanned without further processing. Average spectra of each replicate samples were obtained from the spectra from the six portions of the sample. Mean spectra of each cotton variety were obtained by measuring the randomly selected cotton bolls from two biological replicates. All samples were scanned by an FTS 3000MX FT-IR spectrometer (Varian Instruments, Randolph, MA, USA) equipped with a ceramic source, KBr beam splitter, and deuterated triglycine sulfate (DTGS) detector and attenuated total reflection (ATR) attachment according to the methods that were previously described in Liu and Kim [29]. The spectra were normalized by dividing the intensity of an individual band in the 1800-600 cm -1 region with the average intensity in that 1800-600 cm -1 region, and subsequent principal component analysis (PCA) characterization was performed in the 3000-1200 cm -1 IR region with mean centering (MC), multiplicative scatter correction (MSC), and Savitzky-Golay first-derivative (13 points) spectral pretreatment and with leave-one-out cross-validation method.

Pyrolysis-molecular beam mass spectrometry lignin analysis
Five cotton bolls were randomly selected from each replicate samples of the cultivated G. arboreum A 2 -100 and G. hirsutum TM-1 as well as wild G. raimondii D 5 -31. Two replicated samples of the dried cotton fibers of G. raimondii D 5 -31, G. arboreum A 2 -100, and G. hirsutum TM-1 were cut in a Wiley mill into 20 mesh. Average contents of each cotton variety were obtained by measuring the randomly selected cotton bolls from two biological replicates. Lignin analysis was performed with pyrolysis molecular beam mass spectroscopy (pr-MBMS) by Complex Carbohydrate Research Center (CCRC) at University of Georgia. Duplicated cotton samples along with control samples including NIST 8492 (lignin content, 26.2%) and aspen standards were pyrolyzed at 500˚C and the volatile compounds were analyzed for lignin using a molecular beam mass spectrometer (Extrel Core Mass Spectrometers). The raw data were processed through UnscramblerX 10.1 software to obtain the principal components and raw lignin data. G. arboreum A 2 -100 fibers exclusively composed of cellulose (95.6~100%) with the lowest lignin level among cotton species was also used as the lignin base line for all tested cotton samples.

Transcriptomic analyses
RNA-seq reads for the cotton materials shown in Table 1 were retrieved from the NCBI SRA database. These reads were aligned to the JGI G. raimondii reference genome [10] using gsnap software and reads that mapped to annotated genes were counted using bedtools software [30,31]. RNA-seq expression analysis was conducted following the PolyCat pipeline as previously described [24,32]. Briefly, all reads were aligned to the JGI G. raimondii reference genome, then the PolyCat software assigned each categorizable read to either the A T or D T subgenome based on an index of homeoSNPs. Using the retrieved RNA-seq reads (Table 1 and S1 Table), RPKM (reads per kilobase of transcript per million reads mapped) numbers were determined, and specifically or differentially expressed genes in developing D 5 , Li 1 or Li 2 fibers at elongating PCW or wall-thickening SCW stage were identified and annotated based on the best hit by BLAST search with The Arabidopsis Information Resource version 10 (TAIR 10). The GO enrichment analysis was performed using agriGO v2.0 Singular Enrichment Analysis [33].
Suberin and lignin depositions in the polyploid G. hirsutum Li 1 and Li 2 mutant fibers among the 2 nd set of cotton materials. The unginned Li 1 and Li 2 fibers as well as their NIL fibers appear to be white ( Fig 1B). On the contrary, the ginned Li 1 and Li 2 fibers showed a brown color that was visually different from their NIL fibers showing a white color ( Fig 5A). Consistently, the IR spectra of Li 1 and Li 2 fibers were also different from those of DP-5690 fibers ( Fig 5B). Multiple IR spectral peaks assigned as suberin (1738, 2850, and 2920 cm -1 ) and lignin (1705-1720 cm -1 ) components were specifically detected in the Li 1 and Li 2 short fibers ( Fig 5B). In addition, a bigger bulge area (1580~1640 cm -1 ) of IR spectrum was detected from the short mutant fibers as compared with the NIL fibers. The bulge area may be composed of four close IR peaks (1588, 1606, 1624, and 1635 cm -1 ) that were assigned as suberin fractions from other plants [37,41].

Transcriptomic profiles of developing cotton fibers
Classification of developmental stages of the 1 st set of cotton fibers retrieved from the original RNA-seq analyses. As summarized in Table 1, RNA-seq data of developing G. raimondii, G. arboreum, and G. hirsutum fibers at 10, 20, or 28 DPAs were available from a public database. Because they are distinct species and grown in different environments [34-36], a closer examination of the RNA-seq data was needed to better align overall expression profiles and the fiber growth stages for more meaningful comparisons. The fiber developmental stages

PLOS ONE
of the three cotton species used in the original research [34-36] were first classified by monitoring the transcript abundance and patterns of the indicator genes including fasciclin-like arabinogalactan [43] and expansin [44] that are specifically up-regulated at PCW stage as well as cellulose synthase (CesA) genes that are specifically up-regulated at SCW stage [45]. Three Fasciclin-like arabinogalactan genes and four expansin genes were commonly up-regulated in the 10 DPA fibers of all three cotton species ( Table 2), suggesting that developing G. raimondii, G.  Table 2. Classifications of cotton fiber developmental stages of G. raimondii, G. arboreum, and G. hirsutum. Ten indicator genes are shown for each species according to genome type and at 10, 20, or 28 DPA. The most up-regulated RPKM numbers of each indicator genes in the cotton species were written in bold fonts.

A 2 -10 A 2 -20 D 5 -10 D 5 -20 A T -10 A T -20 A T -28 D T -10 D T -20 D T -28
Gorai  (18,333 EGs) and SCW (17,520 EGs) stages were also similar to those in the G. hirsutum A T subgenome at PCW (16,188 EGs), and SCW (17,472 EGs) stage. In this study, the focus was on identification of genes specifically expressed in developing G. raimondii fibers, but not expressed (zero RPKM) in developing fibers of the other cotton species. Transcriptomic profiles between the two diploid cotton species identified specifically expressed genes (SEGs) in developing G. raimondii fibers at PCW (2,663 SEGs) and SCW (3,193 SEGs) stage as shown in Fig 6A. When the transcriptomic profiles of the three cotton species compared, 1,385 and 1,520 SEGs were identified from developing G. raimondii at PCW and SCW stages, respectively ( Fig 6A and S3 Table). Between PCW and SCW stages of developing G. raimondii fibers, 297 SEGs were overlappingly expressed (Fig 6B).
Transcriptomic analyses of the 2 nd set consisting of G. hirsutum NILs differing in fiber length. The original RNA-seq analysis of the short fiber mutants (Li 1 and Li 2 ) was performed with total RNAs extracted from developing fibers at PCW stage (8-12 DPA) grown in greenhouse or cotton fields [22,24]. To identify differentially expressed genes (DEGs), developing fibers from field grown Li 1 and Li 2 fibers were compared to their NIL, G. hirsutum DP-5690, using a 2-fold difference as a threshold. In the Li 1 mutant fibers, 4,043 genes were up-regulated whereas 2,536 genes were down-regulated (Fig 7A). In the Li 2 mutant fibers, 2,419 genes were up-regulated, whereas 1,740 genes were down-regulated (Fig 7A). Identification of candidate genes producing the color pigments in the two mutant fibers focused on the 1,285 genes (S5 Table) that were commonly up-regulated in the developing Li 1 and Li 2 fibers (Fig 7B).
GO enrichment analysis of the 1,285 UGs identified six GO categories ( Table 4). The two GO categories including transporter activity (GO:0005215) and cellular respiration (GO:0045333) were also previously identified in the original analysis [24] by MapMan ontology [51]. Among the newly identified four GO categories, ADP binding (GO:0043531) and protein kinase activity (GO:0004672) composed of multiple nucleotide-binding leucine-rich repeat receptors (NLRs) and leucine-rich repeats receptor-like kinase (LRR-RLK) are involved in plant development and stress responses in other plants [52]. The other two GO categories, tetrapyrrole binding (GO:0046906) and response to endogenous stimulus (GO:0009719), were also over-represented in wild diploid G. raimondii fibers (Tables 3 and 4).

Integration of the chemical phenotypes and transcriptomic profiles between the two sets of cotton materials differing in fiber lengths and cellulose contents
For an examination of the quantitative and statistical significances of the spectral features showing different chemical components among the fiber samples used in the 1 st and 2 nd sets, a principal component analysis (PCA) was performed with the spectral region (1200-3000 cm -1 ) composed of IR peak bands of suberin, lignin, and cellulose ( Fig 8A). The analysis showed a dominant first principal component (PC1) accounting for 75.9% of the total variation, and revealed a distinction in PC1 score within the six tested samples (Fig 8A). The PC1 score increased in the order of G. hirsutum Li 2 < G. hirsutum Li 1 < G. raimondii D 5 -31 < G. hirsutum TM-1 � G. hirsutum DP-5690 � G. arboreum A 2 -100. The three cultivated cottons of G. arboreum   Table 3. GO enrichment analyses of specifically expressed genes in PCW or SCW stage of developing G. raimondii (D 5 ) fibers. List and annotation of genes are described in S3

Common characteristics of physical properties of cotton fibers from wild diploid G. raimondii and polyploid G. hirsutum Li 1 and Li 2 mutants
Physical properties of cultivated cotton fibers are generally assessed by a High Volume Instrument (HVI) which is defined by the International Cotton Advisory Committee as a standardized instrument for cotton fiber quality measurements [7]. Cotton fibers with a length less than 12.7 mm are classified as short fibers that reduce the quality of spun yarns. The cotton fibers produced by G. raimondii and G. hirsutum Li 1 and Li 2 mutants were too short to be measured by HVI. Thus, we manually measured the maximum fiber lengths of the wet and relaxed cotton fibers from the chalezel end of cottonseeds [27]. The maximum fiber length of G. raimondii D 5 -6 (11.7 mm) and There are two different types of G. hirsutum fibers. Lint fibers differentiate from ovule epidermis on the day of anthesis and they grow approximately 25~35 mm based on HVI measurements. In contrast, linter or fuzz differentiate from the ovule epidermis around 5 to 10 DPA and they do not grow longer than 15 mm [68]. Wild diploid G. raimondii was often described as a lintless, non-fibered, or fiberless species [12,69,70]. The full names of the Li 1 and Li 2 mutants also contain "lintless" [17,18]. However, the fiber initials of G. raimondii [8], G. hirsutum Li 1 mutant [19], and G. hirsutum Li 2 mutant [20] all differentiate on the day of anthesis. Thus, the short fibers produced from G. raimondii and G. hirsutum Li 1 and Li 2 mutants can be classified as lint fibers according to the definition described by Lang [68].
In this study, we showed that the three short cottons of wild G. raimondii and two G. hirsutum mutants produced fibers containing color pigments composed of lignin and suberin (Figs 4A and 5A). The green coloration of G. raimondii fibers was reported by Hutchinson and his colleagues in 1947 [69], but its color pigment was not further characterized. A recent study showed that NIR spectra of G. raimondii fibers were similar to those measured from naturally green colored G. hirsutum fibers [71]. Despite the extensive studies of Li 1 and Li 2 mutant fibers, the brown color pigments of the short fiber mutants (Fig 5A) have been unnoticed. Green color of the cotton fibers is faded to tan color when they are exposed to light [72]. The light brown color of the short mutant fibers might have been overlooked because their NIL, DP-5690, produces white lint fibers.

Common chemical components among diploid G. raimondii and polyploid G. hirsutum mutants
Chemical analyses using cellulose assay, ATR FT-IR spectroscopy, and mass spectrometry consistently showed suberin and lignin components in the three short fibers. Average cellulose  [71] and functional divergence of cellulose synthase orthologs between wild G. raimondii and cultivated G. arboreum [45].
The three short cottons of G. raimondii and G. hirsutum Li 1 and Li 2 mutants demonstrated the signature IR spectral peaks of suberin and lignin (Figs 4B and 5B). Another type of short fiber mutant li y also showed the signature IR spectral peaks of suberin [73]. In naturally green colored G. hirsutum, suberin layers were observed in the secondary cell wall in cotton fibers [72,74], and a major lignin precursor and its derivatives were deposited in the suberin layers [75]. Suberin and lignin can be produced with common precursors, i.e. phenolic components [76]. In contrast to suberin consisting of phenolics and aromatic polymers, lignin is purely composed of poly-aromatic components [76]. Generally, lignin is derived from three phenylpropanoid monomers, the monolignols 4-coumaryl, coniferyl, and sinapyl alcohols, that produce the 4-hydroxyphenyl (H), guaiacyl (G), and syringyl (S) units in the polymer [77]. Our mass spectrometry lignin analysis showed significantly greater content of the S lignin in wild G. raimondii fiber (1.8%) than those of the other cultivated cotton fibers (0~0.6%). Recent studies suggest that lignin may play an important role in cotton fiber quality [78,79].
The integrative PCA method of the two sets enabled classifying the six cotton fibers into two classes according to the PC1 scores ( Fig 8A). All three cultivated cottons shared similar positive PC1 scores without any significant variation, whereas all three short cottons showed negative PC1 scores with significant and broad variation. During cotton fiber development, underdeveloped cotton fibers containing high levels of non-cellulosic components show negative PC1 scores [80]. As cellulose content increases during normal fiber development, PC1 scores also increase and become positive [81]. Notably, the pattern of the PC1 score in Fig 8A was consistent with previous reports that PC1 scores increased with the cellulose content during cotton fiber development [45]. The IR bulge area likely represents a macromolecule complex composed of suberized components whose IR signals can overlap. Interestingly, there were noticeable IR peak bands of the bulge area among G. hirsutum Li 1 (1623 cm -1 ), G. hirsutum Li 2 (1610 cm -1 ), and G. raimondii D5 (1635 cm -1 ) (Figs 4B and 5B). These results along with the significant variation of their PC1 scores showed variation in the three short cottons although they shared suberin and lignin components (Figs 4B, 5B and 8A).

Commonly up-regulated orthologs among diploid G. raimondii and polyploid G. hirsutum mutants
To test if suberin and lignin genes were specifically up-regulated in developing G. raimondii fibers, we used the RNA-seq data performed with the RNAs extracted from developing fibers of G. raimondii, G. arboreum, and G. hirsutum (Table 1) [34-36]. The original transcriptomic analyses of the short fiber mutants (Li 1 and Li 2 ) and their NIL DP-5690 were only performed with total RNAs extracted from developing fibers at PCW stage [22,24]. Thus, we verified that suberin and lignin were specifically detected at the PCW stage of developing mutant fibers (S1 Fig). The short fiber phenotypes of G. raimondii fibers [70,71] and mutants [19,20,22,24] were consistent across various growing conditions. In this study, we used the JGI G. raimondii D 5 reference genome for analyzing transcriptomic profiles of the two sets because the G. raimondii genome sequence shows high homology (>96%) with the coding sequences of G. hirsutum A T and D T subgenomes. Thus, the D 5 reference genome sequence has been successfully used for characterizing the Li 1 and Li 2 genomes by several groups [19,20,22,24,82]. Transcriptomic analysis of the 1 st set identified that genes involved in suberin and lignin biosynthesis were specifically expressed in G. raimondii fibers (Fig 6, Table 3 and S3 Table). Among them, glycerol-3-phosphate acyltransferase 1 (GPAT), cytochrome P450 family genes, and laccases were reported to be involved in suberin or lignin biosynthesis in other plants [46,47,53,54,76]. Among the four GO categories over-represented in G. raimondii fibers (Table 3), Omethyltransferase activity is essential for biosynthesis of lignin, suberin and flavonoids [49,83]. Integrative analyses of the two sets identified 29 genes that were commonly up-regulated in wild cotton species and short fiber mutant fibers (Table 5). Of the 22 annotated genes, four genes such as laccase, peroxidase, ABC-2 type transporter, and JAZ8 are involved in biosynthetic processes of lignin, suberin, and their derivatives [47,[54][55][56]84]. The other 18 annotated genes are reported to be related to stress responses ( Table 5). The mutations of an actin [25] and a putative Ran Binding Protein 1 [23] cause the short fiber phenotypes of the Li 1 and Li 2 mutant respectively, and also up-regulate the genes involved in stress responses including lignin, suberin and flavonoid biosynthesis ( Table 5).
Lignin deposition was suggested to reduce the extensibility of expanding fiber cell walls [79]. Suberin has been reported to be a major regulator of water and solute transport, and a pathogen barrier in plant cell walls [76]. A recent functional study of Arabidopsis mutants altered in suberin deposition clearly showed the reductions of the apoplastic transport of water and ions [85]. Hydrophobic suberin in the cotton fiber cell walls also negatively affect apoplastic transport activities in cotton fibers [72,74,86].

Conclusion
Here, we used both phenotypic and transcriptomic analyses for identifying common mechanisms reducing fiber elongation in the short fibers generated from G. hirsutum Li 1 and Li 2 mutants as well as wild G. raimondii. Chemical analyses identified a common deposition of suberin and lignin in the short fiber cell walls. The genes involved in suberin and lignin biosynthesis were also commonly up-regulated in the elongating cotton fibers of the three short cottons as compared with the cultivated and long G. arboreum and G. hirsutum fibers. These results support a notion that suberin and lignin deposition may affect cotton fiber elongation process negatively. They also provide insight on how suberin and lignin biosynthesis can affect fiber length and cellulose productions in wild and cultivated cotton species.